home *** CD-ROM | disk | FTP | other *** search
-
- Back To The Basics
- by SPo0ky
-
-
- I wrote this tutorial because some beginners who read the first two
- editions of our magazine told us that they have problems understanding the
- basics of assembly... In this tutorial I'll not try to teach you any virus
- techniques, I'll only try to explain the most basic things of assembly like
- how tasm/tlink work, registers, memory, interrupts,... as simple as
- possible.
- Maybe this sounds boring to you but if you want to code your own viruses
- you have to understand the basics.
- Also this article will not fully teach you assembly! It will only help you
- to understand the basics. To understand assembly you will need at least a
- few weeks with good training (-programming). I also suggest you to buy a
- good book about Assembler and to read many many many virus source codes
- (Thats the way I used to learn Assembler).
- (Also - If you don't have a bookstore in your town you can use the online
- bookstore at http://intertain.com, they have many great books, and they
- are fast and cheap!)
-
-
- 1. Why Assembler?
-
- There are some pros and cons why you should (should not) use Assembler.
- Contrary to HLL (High Level Languages), like Pascal, Basic or C++, in
- Assembler you have to tell the CPU each step it has to execute,
- which means that to write a big (complex) Assembler program is very time
- consuming. Thats why most of the time, big programs are not completely
- written in Assembler but Assembler parts are included in HLL-programs.
- Another con of Assembler is that programs can not be used on other brands of
- CPUs which they were written on because each brand of CPU has another
- instruction set (We will use the 80x86 instruction set, which is used in
- the IBM-PC and compatibles), but this gives you the possibility to optimize
- the code for one specific CPU so that you can use all of its capabilities.
- The result is extremely FAST and SMALL code.
-
-
- 2. A simple assembly program
-
- Lets start with a short Assembler program. You don't have to understand
- what each instruction is used for now, I'll explain that later.
- Just type this program into an ascii editor (-edit.com) and save it as
- example.asm.
-
- .model tiny
- .code
- org 100h
-
- start:
- mov ah,9h
- mov dx,offset message
- int 21h
-
- mov ax,4c00h
- int 21h
-
- message db 'CodeBreakers Rule! ;-)',10,13,'$'
- end start
-
-
- 3. Assembler and Linker (TASM and TLINK)
-
- Before I continue I'll show you how to use TASM and TLINK to compile such
- a file to an EXE or COM program.
-
- After you saved the above program into a file (example.asm) you can type
-
- TASM EXAMPLE1.ASM
-
- This will generate a so called OBJECT-FILE named EXAMPLE.OBJ.
- Generally this file contains only information for the linker and a
- translated version of the above code into binary (machine code).
-
- Example:
-
- MOV AH,9h
-
- would be translated into 1011 0100 0000 1001
-
- This Object file is not executable yet, to make an executable file (COM or
- EXE) we have to use a LINKER (TLINK).
- This linker will stick one or more object files together to one executable
- file.
- To link the example1.obj file you just type:
-
- TLINK /t EXAMPLE1.OBJ
-
- The /t switch tells the linker that it should produce a COM file, if you
- leave /t away you will get an EXE file. Anyway, now you should have a ready
- to run COM file... Just type EXAMPLE to start it!
-
-
- 4. Registers
-
- Registers are extremely fast accessible memory cells in the CPU, they are
- used to address memory, to give instructions to the CPU,... generally they
- are used to store "values".
- All registers can store 16 bits (= 2 bytes) of data and some registers can
- be split into two 8 bit (= 1 byte) registers.
-
- Registers of the 8086 CPU's:
-
- +--------------------+
- | AH | AL > AX | -> Accumulator Register
- | BH | BL > BX | -> Base Register
- | CH | CL > CX | -> Count Register
- | DH | DL > DX | -> Data Register
- +-----+--------------+
- | SI | -> Source Index
- | DI | -> Destination Index
- +--------------------+
- | BP | -> Base Pointer
- | SP | -> Stack Pointer
- +--------------------+
- | CS | -> Code Segment
- | DS | -> Data Segment
- | ES | -> Extra Segment
- | SS | -> Stack Segment
- +--------------------+
- | IP | -> Instruction Pointer
- +--------------------+
- | F | -> Flag-Register
- +--------------------+
-
- Only the registers AX, BX, CX and DX can be split into two parts:
- AH, AL, BH, BL, CH, CL, DH, DL. Each of them has only 8 bits instead of
- 16!
-
- BTW - 8 bits are called a BYTE
- 16 bits are called a WORD
- AH, AL, BH, BL, and so on, are called byte register and all others
- (AX, BX, CX, DX,...) are called WORD registers.
-
-
- 5. MOV(e)
-
- Assembler (or the CPU) provides many functions which allow you to
- manipulate (to change) the data stored in a register. One of the most
- important instructions used to manipulate a register is MOV.
- Look back to our example program... we used MOV 3 times (MOV AH,9 /
- MOV DX, OFFSET MESSAGE / MOV AX,4C00H).
- You can change the data in a register by using: MOV <REG>, <DATA>. Where
- <REG> is the 16 or 8 bit register you want to change, and <DATA> is the
- data you wan to store in the register.
-
- But MOV can't be only used to change the data of registers, it can also
- be used to change data stored at a certain location in MEMORY.
-
- 6. MEMORY
-
- When you execute our little example program it just displays text... now,
- how can the computer know WHICH text it should display? Again, look at the
- example programs source code:
-
- MOV AH, 9
- MOV DX, OFFSET MESSAGE
- INT 21H
- .
- .
- MESSAGE DB 'Some text...$'
-
- The first line is used to tell the CPU that it should display text
- (function 9 in AH is used to display text). And the second line tells the
- CPU where it has to look for the text in memory.
- Each byte in memory gets a number (an address), so the CPU knows exactly
- which byte it has to read/write.
- In the second line, "OFFSET MESSAGE" would return the address where the
- MESSAGE is stored in memory and store it in DX (To display text MS-DOS
- requires that the address of the text is stored in the register DX!).
-
- Some examples:
-
- Let's say this is our memory:
-
- Offset: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
- Data: | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P |
-
- We want to get the data which is stored at offset 6 into register AH.
- Which instruction would be used?
-
- -> MOV AH, [6]
-
- This would put the data at offset 6 (= 'F') into register AH. The '[' and
- ']' are very important, If you forget them it would put the number 6 into
- AH instead of 'F'!
-
- Remember, AH is a 8 bit register (-> It can store only 1 byte or 8 bit),
- what would happen if we'd use AX (which is a 16 bit register) instead of
- AH?
-
- -> MOV AX, [6]
-
- AX would become 'GF'. Yes, not 'FG'! In the x86's everything you read from
- memory into a word register is turned around! (Thats not that important for
- you yet, but you should know it anyway...)
-
- 7. Interrupts
-
- I needed a very long time until I found a (hopefully) good way to explain
- interrupts to a newbie! Finally I decided to use a simple example, the
- MS-DOS Prompt.
- When you are at the MS-DOS prompt you can enter commands, after you press
- RETURN the command gets executed. You could compare the pressing of the
- RETURN key with an interrupt. In assembler you fill the registers with
- values, then you execute the interrupt. The interrupt code would then
- evaluate the values you put into the register, it would decide which
- function it should execute, .... and finally it would return the results
- (in a register, on the screen or on your hdd,...)
-
- Ex.:
- MOV AH,9
- MOV DX,OFFSET MESSAGE
- INT 21H
-
- The first two lines of this example have already been explained above,
- the 3rd line would execute an interrupt, interrupt 21h(ex).
- Interrupts are numbered from 0 to FFh, each interrupt provides other types
- of 'services'. Like
- INT 21H, this is the MS-DOS interrupt, it provides basic DOS functions,
- like input/output of text, file functions (like open, read, write to
- files).
- INT 13H is the BIOS interrupt, it provides many Disk access functions
- like reading/writing/formatting of disk sectors.
- INT 10H is the video interrupt, this interrupt allows you to use many
- functions to make nice graphics :-) It provides functions to change the
- video mode, to draw pixels onto the screen, to change the color of text,
- and so on...
- For a list of all interrupts and their functions download Ralf Browns
- Interrupt List from http://www.cs.cmu.edu/afs/cs.cmu.edu/user/ralf/pub/WWW/
-
- 8. The sample program, step-by-step
-
- .model tiny
- -----------
- This is not code which will later be executed... it just tells TASM/TLINK
- that they should use one segment for the whole program. There are also
- .model small, .model huge, etc,... but for small programs (= simple
- viruses ;) model tiny is enough.
-
- .code
- -----
- This tells TASM and TLINK that our executable code begin here. After this
- like we can begin to write our main program.
-
- org 100h
- --------
- All COM files are loaded into memory at offset 100h. ORG tells the
- compiler 'where to store the code in memory'.
-
- start:
- ------
- This is just a lable which is required by TASM...
-
- MOV AH, 9
- ---------
- We want to display text... we will use the dos interrupt to do so.
- INT 21h requires that we put the function number into register AH. So, to
- tell the CPU that we want to display some text we 'MOV(E)' the number of
- the function used to display text (9) into register AH.
-
- MOV DX, OFFSET MESSAGE
- ----------------------
- The CPU needs to know where to find the text it should display... If we
- use INT 21H we have to store this location in register DX. To do so we
- just get the address of the message with 'OFFSET MESSAGE' and move it into
- DX.
-
- INT 21H
- -------
- Now that we have 'collected' enough information (filled the registers
- with many stupid numbers) we can execute the interrupt, which will finally
- get the CPU to display some text for us. :-)
-
- MOV AX,4C00H
- ------------
- Never forget the last two lines of this code! They are used to exit a
- program. If you forget them, your program will crash.
- DOS uses the function 4C to exit programs, 00 means that we will not
- return an ¿Error Code¿ (exit without an error).
-
- INT 21H
- -------
- Now INT 21H will execute the function to exit the program...
-
- message db 'CodeBreakers Rule! ;-)',10,13,'$'
- ------------------------------------------------
- This is the message we WANT to display :-)
- '10,13' does the same as pressing return, it puts the cursor into the next
- line.
- '$', this sign doesn't mean 'fast money' :) ... somehow DOS needs to know
- where to stop displaying text, Bill G. decided to use '$'.
-
- end start
- ---------
- This indicates the end of the label 'start', also required by TASM.
- btw - This doesn't exit the program! To exit the program you still have to
- use function 4C with INT 21h.
-
-
- Ok, thats all for now... I know that this is a very basic tutorial, and
- that it wasn't written very well... but I hope that it answered at least
- a few of your questions! If you have any further questions feel free to use
- the message board on our homepage at www.codebreakers.org, or email me at
- spo0ky@thepentagon.com (spo0ky with zero! :)... Maybe I'll write a FAQ with
- specific questions for the 4th edition.
-
- --SPo0ky
-